{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "ModelValidation & CrossValidation.ipynb",
      "provenance": [],
      "authorship_tag": "ABX9TyMDt0vaerCP3T/pbeb01DWL",
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Divyanshu-ISM/Machine-Learning-Deep-Learning/blob/main/ModelValidation_%26_CrossValidation.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "--WDLnK6b93M"
      },
      "source": [
        "# Understanding Model Validation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "L3g5YZXRZi6s"
      },
      "source": [
        "import numpy as np\r\n",
        "import pandas as pd\r\n",
        "import seaborn as sns\r\n",
        "import matplotlib.pyplot as plt"
      ],
      "execution_count": 1,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "xlK9sUwDZzBh"
      },
      "source": [
        "from sklearn.datasets import load_iris\r\n",
        "\r\n",
        "iris = load_iris()"
      ],
      "execution_count": 2,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "NzNzitVdaO2T"
      },
      "source": [
        "X = iris.data\r\n",
        "y = iris.target\r\n",
        "Xcols = iris.feature_names"
      ],
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "R0YyGcDJaVdo"
      },
      "source": [
        "df = pd.DataFrame(data = X, columns=Xcols)\r\n",
        "df['Species'] = y"
      ],
      "execution_count": 10,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FW5qy2P3bIJ7"
      },
      "source": [
        "# 1. K-Nearest Neighbors : 5 Fold Cross Validation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BuFiB9GhaWIQ"
      },
      "source": [
        "from sklearn.neighbors import KNeighborsClassifier"
      ],
      "execution_count": 14,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "yyCg4i5ja6JQ"
      },
      "source": [
        "knn = KNeighborsClassifier(n_neighbors=5)"
      ],
      "execution_count": 15,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "a1CM9R59b53t"
      },
      "source": [
        "### Train-Test Split\r\n",
        "\r\n",
        "> This method validates the Model based on a chunk of the Data that it has never seen (Test Data)."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "EU64TuFibqre"
      },
      "source": [
        "from sklearn.model_selection import train_test_split"
      ],
      "execution_count": 16,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "e9DlCHTjcfhw"
      },
      "source": [
        "X1,X2,y1,y2 = train_test_split(X,y,train_size=0.5)"
      ],
      "execution_count": 18,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3He9_yg1dmlU",
        "outputId": "1758385c-d0b7-4e69-aa82-53c968dc5cff"
      },
      "source": [
        "knn.fit(X1,y1)"
      ],
      "execution_count": 20,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',\n",
              "                     metric_params=None, n_jobs=None, n_neighbors=5, p=2,\n",
              "                     weights='uniform')"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 20
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "k1nKaT4He53t",
        "outputId": "a128e1f7-be4f-4b62-98c2-b4a95c66916c"
      },
      "source": [
        "from sklearn.metrics import accuracy_score\r\n",
        "yp2 = knn.predict(X2)\r\n",
        "\r\n",
        "accu = accuracy_score(yp2,y2)\r\n",
        "\r\n",
        "accu"
      ],
      "execution_count": 22,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.96"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 22
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UPbilxY7fSkt"
      },
      "source": [
        "## Cross Validation \r\n",
        "\r\n",
        "> This methodology splits the data into various chunks, and then keeps one chunk for testing, and other (rest) for training. \r\n",
        "\r\n",
        "> Final score is the average of all such splits."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RNG3hoomf1oy"
      },
      "source": [
        "A) Cross-Validation from Scratch."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "t88TAES7hKTM"
      },
      "source": [
        "> 2-Fold Cross Validation"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "YegpF_dWfPIh",
        "outputId": "1e5ca8ef-1768-43ed-e366-69b56ac9a11d"
      },
      "source": [
        "model = KNeighborsClassifier(n_neighbors=5)\r\n",
        "X1,X2,y1,y2 = train_test_split(X,y,train_size=0.5)\r\n",
        "\r\n",
        "#Model 1 : Train on X1,y1 and Test on X2,y2\r\n",
        "model.fit(X1,y1)\r\n",
        "yp2 = model.predict(X2)\r\n",
        "acc2 = accuracy_score(yp2,y2)\r\n",
        "\r\n",
        "#Model 2 : Train on X2,y2 and Test on X1,y1\r\n",
        "model.fit(X2,y2)\r\n",
        "yp1 = model.predict(X1)\r\n",
        "acc1 = accuracy_score(yp1,y1)\r\n",
        "\r\n",
        "#Scores List\r\n",
        "scores = [acc2,acc1]\r\n",
        "\r\n",
        "#Final Score : Cross-Val Score : A more representative model performance (Accuracy)\r\n",
        "cv_score = np.mean(scores)\r\n",
        "\r\n",
        "cv_score"
      ],
      "execution_count": 24,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.9733333333333334"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 24
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "S0FBY_vmg-9Q"
      },
      "source": [
        "We can Clearly see that there is a Differnce between this accuracy measure and the previously evaluated measure. We trust this one more. "
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oDCYIECShNOd"
      },
      "source": [
        "B) 5-Fold Cross-Validation using Sk-learn"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cpipGdKDg7Gi"
      },
      "source": [
        "from sklearn.model_selection import cross_val_score"
      ],
      "execution_count": 25,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "xW0w8iHYhXEa",
        "outputId": "76d58c43-be4e-4c55-c886-32115e07ae03"
      },
      "source": [
        "model = KNeighborsClassifier(n_neighbors=5)\r\n",
        "scores_cv = cross_val_score(estimator=model,X=X,y=y,cv=5)\r\n",
        "scores_cv"
      ],
      "execution_count": 26,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([0.96666667, 1.        , 0.93333333, 0.96666667, 1.        ])"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 26
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hgTNgTYCh5ML"
      },
      "source": [
        "final_accuracy = np.mean(scores_cv)"
      ],
      "execution_count": 27,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "F78jEpR2h8qR",
        "outputId": "3d6f89f8-b8d8-4afc-cf09-e68877c3da67"
      },
      "source": [
        "final_accuracy"
      ],
      "execution_count": 28,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.9733333333333334"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 28
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P95B0yMLh-je"
      },
      "source": [
        "#97.3% is the actual model Accuracy. "
      ],
      "execution_count": 29,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "13qZzapjiGTx"
      },
      "source": [
        "# 2. RandomForest Classifier"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZZrZsHcziDdx"
      },
      "source": [
        "from sklearn.ensemble import RandomForestClassifier"
      ],
      "execution_count": 30,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "fvcE-LHgiO7A",
        "outputId": "f6f8f345-5a39-4356-89e2-6ae5ac225dad"
      },
      "source": [
        "forest = RandomForestClassifier()\r\n",
        "forest_cvscores = cross_val_score(forest,X,y,cv=5)\r\n",
        "\r\n",
        "forest_cvscores"
      ],
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "array([0.96666667, 0.96666667, 0.93333333, 0.96666667, 1.        ])"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 39
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "w0upSxfWieRn",
        "outputId": "fdd896e8-ee56-4259-a325-d797e4fe6368"
      },
      "source": [
        "np.mean(forest_cvscores)"
      ],
      "execution_count": 40,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.9666666666666668"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 40
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G3rGsDmEiy-p"
      },
      "source": [
        "# 3. Gaussian Naive Bayes"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "28v_IHJZihQu",
        "outputId": "f7d4e0ee-5471-4b66-94b3-edb1536218d5"
      },
      "source": [
        "from sklearn.naive_bayes import GaussianNB\r\n",
        "gnb = GaussianNB()\r\n",
        "cvscores_gnb = cross_val_score(gnb,X,y,cv=5)\r\n",
        "np.mean(cvscores_gnb)"
      ],
      "execution_count": 41,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "0.9533333333333334"
            ]
          },
          "metadata": {
            "tags": []
          },
          "execution_count": 41
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nP7mPPRRjPY-"
      },
      "source": [
        ""
      ],
      "execution_count": 44,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "mnxN94iijVHE"
      },
      "source": [
        ""
      ],
      "execution_count": null,
      "outputs": []
    }
  ]
}